“Every Newborn-BIRTH” protocol: observational study validating indicators for coverage and quality of maternal and newborn health care in Bangladesh, Nepal and Tanzania

Background To achieve Sustainable Development Goals and Universal Health Coverage, programmatic data are essential. The Every Newborn Action Plan, agreed by all United Nations member states and >80 development partners, includes an ambitious Measurement Improvement Roadmap. Quality of care at birth is prioritised by both Every Newborn and Ending Preventable Maternal Mortality strategies, hence metrics need to advance from health service contact alone, to content of care. As facility births increase, monitoring using routine facility data in DHIS2 has potential, yet validation research has mainly focussed on maternal recall surveys. The Every Newborn – Birth Indicators Research Tracking in Hospitals (EN-BIRTH) study aims to validate selected newborn and maternal indicators for routine tracking of coverage and quality of facility-based care for use at district, national and global levels. Methods EN-BIRTH is an observational study including >20 000 facility births in three countries (Tanzania, Bangladesh and Nepal) to validate selected indicators. Direct clinical observation will be compared with facility register data and a pre-discharge maternal recall survey for indicators including: uterotonic administration, immediate newborn care, neonatal resuscitation and Kangaroo mother care. Indicators including neonatal infection management and antenatal corticosteroid administration, which cannot be easily observed, will be validated using inpatient records. Trained clinical observers in Labour/Delivery ward, Operation theatre, and Kangaroo mother care ward/areas will collect data using a tablet-based customised data capturing application. Sensitivity will be calculated for numerators of all indicators and specificity for those numerators with adequate information. Other objectives include comparison of denominator options (ie, true target population or surrogates) and quality of care analyses, especially regarding intervention timing. 
Barriers and enablers to routine recording and data usage will be assessed by data flow assessments, quantitative and qualitative analyses. Conclusions To our knowledge, this is the first large, multi-country study validating facility-based routine data compared to direct observation for maternal and newborn care, designed to provide evidence to inform selection of a core list of indicators recommended for inclusion in national DHIS2. Availability and use of such data are fundamental to drive progress towards ending the annual 5.5 million preventable stillbirths, maternal and newborn deaths.

Valid data and measurement are central to achieving the Sustainable Development Goal (SDG) aspiration of "no-one left behind" [1]. In the United Nations' Global Strategy for Women's, Children's and Adolescents' Health, the ongoing imperative for the right to survive is joined by a new focus on thriving, with wider transformation [2]. Progress for survival has been slowest for the 5.5 million deaths of women and babies around the time of birth each year, including an estimated 2.5 million newborns dying in the first 28 days of life, 2.6 million babies stillborn and 303 000 maternal deaths [3][4][5]. Most of these deaths happen to the poorest families in the poorest countries, and most are preventable [6]. Opportunity exists to save an estimated 3 million lives per year by improving quality of care at birth and care of small and sick newborns [7,8]. Based on this evidence, the Every Newborn Action Plan (ENAP) was launched in 2014 and endorsed by all member states in a World Health Assembly resolution [9]. The plan outlines 2030 country targets of 12 or fewer newborn deaths per 1000 live births and 12 or fewer stillbirths per 1000 total births. Every Newborn is closely aligned with the World Health Organization (WHO) Strategy for Ending Preventable Maternal Mortality (EPMM) [10], since both prioritise quality of care at birth, alongside the Quality, Equity, Dignity movement led by WHO, UNICEF and UNFPA in 11 countries, which aims to halve facility deaths by 2020 [11].
Accurate data are essential to drive progress towards these targets. However, at the dawn of the SDG era, most deaths around the time of birth still occur in settings with the least data on coverage and quality of care, a pattern termed the "inverse data law" [12]. One of the five strategic objectives of Every Newborn is to transform measurement and use of data to track coverage and quality of care [8,9,13]. A top priority has been to develop and implement a time-limited plan to ensure that required core indicators are validated and feasible to measure at scale. In support, WHO and the London School of Hygiene & Tropical Medicine (LSHTM) have coordinated an ambitious Measurement Improvement Roadmap, which reviews specific measurement gaps and provides a multi-year, multi-partner pathway to define specific indicators, test their validity if needed, develop tools, and promote use of data by 2020 [14][15][16].

VIEWPOINTS RESEARCH THEME 5: MEASURING COVERAGE OF ESSENTIAL MATERNAL AND NEWBORN CARE INTERVENTIONS: AN UNFINISHED AGENDA
Ten core indicators were prioritised as part of the Every Newborn multi-country consultation process, including those for impact, coverage and input (Figure 1) [9,16,17]. This protocol relates to the coverage indicators shown in the middle of Figure 1. Indicators of coverage of care for all women and newborns are shaded amber because, whilst their definitions are clear, data on content and quality of care require improvement. The greatest metrics gap is core coverage indicators for specific, high-impact interventions, shown in red in Figure 1. The combination of core indicators for Every Newborn and EPMM is illustrated in Figure 2; approximately half of these indicators are the same [10]. Validating the highest-priority indicators, highlighted in red in Figure 2, is the topic of this research: uterotonics for all women; and neonatal resuscitation, Kangaroo mother care (KMC), treatment for possible serious infections and maternal antenatal corticosteroids (ACS) for newborns with complications [16]. The assumed need for these interventions, their likely coverage and expected prevalence are shown in Appendix S2, Table S1 in the Online Supplementary Document.
Coverage is defined as the number of individuals receiving an intervention or service (numerator) divided by the population in need of that intervention or service (denominator). To date, the main source of coverage and impact data in high-burden countries has been intermittent household surveys, including the Demographic and Health Survey (DHS) and the Multiple Indicator Cluster Survey (MICS) [18,19]. Currently monitored coverage indicators, including antenatal care, skilled birth attendance and postnatal care, mainly measure contact points with health care services, and additional indicators are required to capture effective content of care [16,20,21]. Quality of care measurement requires definitions of characteristics for both provision (eg, safety, effectiveness, timeliness, equity, completeness) and experience of care (eg, client satisfaction) [22,23]. The accuracy of household survey data depends first on the woman's interpretation of what took place at the time, and second on her recalling and reporting this understanding up to five years after the event. Evidence suggests that household surveys do not always accurately capture either the numerator or the denominator for some treatment interventions, such as for pneumonia in young children [24] and events during labour [25]. In addition, since newborn complications occur in only a subset of births (3%-15%, see Appendix S2, Table S1 in the Online Supplementary Document), the required sample size is larger than is feasible in most national DHS. Consequently, not all desired maternal and newborn intervention coverage indicators, specifically those relating to content and quality of care, can be captured through household surveys [16,26].
Globally, more than 75% of babies are now born in facilities, and local count data from routine registers are increasingly available [27]. Whilst health facility data can be used to track coverage more frequently than surveys, previous studies have demonstrated mixed data quality [28][29][30]. Health workers face many barriers in documenting the care they deliver [31,32]. Capturing denominators through routine data is also a major challenge. Firstly, for indicators regarding interventions for the whole population, disaggregated by equity criteria, facility births are not the "true" population denominator. Given the lack of specific and appropriate denominator data, a national health management information system (HMIS) typically uses census-based data to derive forecasts and key population calculations [28]. Secondly, the challenge is magnified if the "true" denominator for the intervention is based on clinical need, and so targets only a proportion of the total population, eg, those requiring treatment for possible serious bacterial infection. Measuring the "true" denominator requires a consistent and objective measure of clinical need, yet clinical judgement and decision making, even when using evidence-based algorithms, are often still subjective [33,34]. Live births are often used as a proxy denominator when it is challenging to define and measure the "true" denominator. A benchmark "target coverage level" is required when proxy denominators are used, because 100% coverage is only a target for a "true" denominator. For example, the "true" denominator for the Caesarean section rate is "women in need of Caesarean section". Because this is challenging to define and measure, the proxy denominator of 100 live births is used, but benchmarking a "target Caesarean section rate" has proved complex [35][36][37][38][39].
Large within-country inequities, with over- and under-provision occurring in parallel [40], highlight the difficulty of constructing useful indicators to measure and compare met need for complications. Therefore, an important focus of this study will be to compare various denominator options and, where a proxy denominator is used, to consider benchmarking.
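To make the distinction between "true" and proxy denominators concrete, the Python sketch below computes coverage both ways for a single facility. All counts and field names are hypothetical illustrations, not EN-BIRTH data; the benchmark is simply the proportion of live births in need, which is what full coverage of need would look like on the proxy scale.

```python
from dataclasses import dataclass


@dataclass
class FacilityCounts:
    """Hypothetical counts for one intervention at one facility."""
    live_births: int      # proxy denominator
    in_need: int          # "true" denominator: births with the clinical need
    treated_in_need: int  # births in need that received the intervention
    treated_total: int    # all recorded recipients, including any over-provision


def true_coverage(c: FacilityCounts) -> float:
    """Coverage against the population in need; 100% is the target."""
    return c.treated_in_need / c.in_need


def proxy_coverage(c: FacilityCounts) -> float:
    """Recorded recipients per live birth; 100% is NOT the target on this scale."""
    return c.treated_total / c.live_births


def proxy_benchmark(c: FacilityCounts) -> float:
    """Proxy-scale value that full coverage of need would produce."""
    return c.in_need / c.live_births


counts = FacilityCounts(live_births=1000, in_need=50, treated_in_need=30, treated_total=40)
print(f"true coverage:  {true_coverage(counts):.0%}")
print(f"proxy coverage: {proxy_coverage(counts):.1%} (benchmark {proxy_benchmark(counts):.1%})")
```

In this hypothetical facility the proxy figure (4.0% against a 5.0% benchmark, ie, 80% of the benchmark) looks better than the true coverage of 60%, because 10 of the 40 recorded recipients were not in need: parallel over- and under-provision partly cancel out on the proxy scale.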
The hierarchy of data needs (Figure 3) illustrates how the scope and granularity of data use decrease at higher levels of the health system [41]. At the point of service delivery, data are needed for individual clinical decisions and to measure the client's perspective of care received. At facility level, aggregate data are collated to inform administrative and managerial decisions for planning, local quality improvement, mortality audit, etc. At district level, data are required for planning (eg, human resources, equipment and drug availability). At national and global level, it is neither possible nor useful to collate all the data used at lower levels of the system. However, for accountability purposes it is crucial to track a few core, standardised indicators to monitor the SDGs and Universal Health Coverage at all levels; these "core indicators" are shown in the centre of the pyramid (Figure 3). WHO maintains a core list of 100 health indicators [42] and ENAP has prioritised 10 core indicators [9,16,17].
Improvements in civil registration and vital statistics systems are enabling a more rapid transition to timelier denominator data on births and deaths. Data systems are increasingly using HMIS to collect, collate, analyse and report routine data from health facilities up to district and national level. This has the potential to be cost-efficient and to generate more frequent coverage measurements [16,27]. Electronic HMIS platforms are increasingly being applied, offering great potential to harmonize traditionally fragmented information streams [43]. One such platform, the District Health Information System, version 2 (DHIS2) [44], is now being successfully implemented in >50 countries with high mortality burdens. Advances in infrastructure and software development are currently driving a transition from predominantly paper-based to mixed recording systems, even at clinical data level; ie, electronic patient records will increasingly be the basis of HMIS data in low- and middle-income country (LMIC) contexts.
Testing indicator validity is critical to improve measurement and to inform decision makers of the likely accuracy of coverage data collected by household surveys and/or routine facility data [20]. Comparison of the reported indicator with an external "gold standard" data source is recommended [45]. Previous validation studies have mainly focused on population-based intervention coverage indicators for use in household surveys [25,46-48]. Observational studies to determine the accuracy of facility registers in high-burden settings have typically focused on outcome indicators [29]. The EN-BIRTH study seeks to address current evidence gaps by testing the validity of priority coverage indicators for newborn and maternal health in facilities in three high-burden country settings.

Aim
This paper is the protocol for the Every Newborn-Birth Indicators Research Tracking in Hospitals (EN-BIRTH) study, which aims to test the validity of selected newborn and maternal health care intervention indicators (coverage/quality aspects and/or safety) in facilities (Table 1). This study, part of the Every Newborn Measurement Improvement Roadmap and working closely with EPMM, aims to increase the evidence base to inform the selection and use of maternal and newborn indicators in national HMIS (particularly DHIS2) and in global tracking.

Research objectives
The research questions per objective, methods and analysis are detailed in Table 2.
Objective 1 - Numerators: To determine the validity (accuracy) of both routine facility register data and maternal recall surveys (all six indicators), compared with direct observation for selected maternal and newborn care interventions (uterotonics for third stage of labour, immediate breastfeeding, neonatal resuscitation and KMC) and with verification from patient case notes (neonatal infection management and ACS administration) (Table 1).

Objective 2 - Denominators: To compare different denominator options, including proxies, and assess the feasibility of their use in routine data platforms (Table 1).

Objective 3 - Content/quality of care: To evaluate different domains of coverage (eg, timing, completion rates, safety) for selected interventions, assessing specific aspects of content/quality of care related to each intervention, with emphasis on timing (Table 3).

Objective 4 - Barriers and enablers: To evaluate barriers and enablers to routine recording of selected indicators, and to explore the perceived utility of these data to improve decision-making, coverage and quality of care at all levels. Research questions include:
- Are some indicators recorded more completely than others?
- Has routine recording changed during the time of the study?
- What are the barriers and enablers to measurement of these indicators?
- What are the barriers and enablers to perceived use of data regarding these indicators?
- How can facility recording and flow of information into DHIS2 for these indicators be improved?

Uterotonic: Proportion of mothers who received oxytocin within the recommended one minute after birth
Immediate breastfeeding: Proportion of babies whose breastfeeding was initiated within one hour of birth
Resuscitation: Proportion of non-breathing babies who had bag-and-mask ventilation initiated within one minute of birth
Kangaroo mother care: Proportion of babies receiving KMC who were held in skin-to-skin position for 18 h or more during the last 24 h
Neonatal infection: Proportion of cases with presumed sepsis, treated with antibiotics, for whom a blood culture result was available
Antenatal corticosteroids: Proportion of preterm labour cases who received antenatal corticosteroids according to WHO criteria for safety

Study design
The EN-BIRTH study uses quantitative and qualitative methods across four objectives (Table 2). The validity of coverage indicators for selected maternal and newborn interventions, as measured by routine facility registers and maternal recall surveys, will be assessed by comparison with the "gold standard" of direct observation by trained clinical observers in the Labour/Delivery ward, operating theatre and KMC ward/area (Figure 4). Data will be extracted from facility registers, and inpatient records will be verified for newborns who received antibiotics for presumed infection and for women who received ACS. Interviews to capture maternal recall will be conducted prior to discharge with all women whose births and/or newborn's care were observed or whose case notes were verified. In addition, barriers and enablers to recording of selected indicators in routine facility registers will be evaluated. Data flow into national HMIS platforms and the perceived utility of the data will be documented.
Research questions were informed by consultation with many Every Newborn stakeholders [9,17], including the WHO-led Measurement Improvement Roadmap meeting [15] and the EN-BIRTH Expert Advisory Group (listed as author group). More than 60 participants in an EN-BIRTH study design workshop [49] provided representation from country partners, national stakeholders, UN agencies, leading academic and professional experts in the field, governmental and non-governmental organisations, clinicians, program managers, other key experts and donors (see Appendix S1 in Online Supplementary Document), and contributed to development of the research protocol (Box 1).

Study populations
Inclusion/exclusion criteria for consenting women, according to data collection method (Figure 5), are:
• Observation on Labour/Delivery ward and in the operating theatre: All admitted women in active labour, excluding those likely to deliver immediately. Women with a prior diagnosis of intrauterine death are also excluded, to avoid further maternal distress.
• Observation on KMC ward/area: All in-born and out-born neonates admitted for KMC.

• Verification from inpatient records for ACS administration: All observed women reported at admission to be <34 weeks' gestation, based on the Expected Date of Delivery (EDD).
• Verification from inpatient records for neonatal infection cases: All babies <28 days old with a main diagnosis of infection (sepsis/meningitis) recorded in the neonatal register or admission/discharge book. Babies will be excluded for major congenital abnormality, neonatal encephalopathy/severe asphyxia, <32 weeks' gestation and/or admission weight <1500 grammes.
• Maternal recall survey: All women whose birth and/or newborn's KMC will be observed, or whose case notes will be verified for ACS or neonatal infection.
• Routine register extraction: All women whose birth and/or newborn's KMC will be observed.

Sample size
Sample size was based on the planned validity analysis for Objective 1, assuming 50% sensitivity ±10% precision and 50% specificity ±10% precision, with α = 0.05, and then applying the lowest previously published rates for neonatal resuscitation [51] and for KMC initiation [52,53]. Since formative data suggested >80% coverage for uterotonic administration, this indicator will be well-powered (see Appendix S2, Tables S2-3 in Online Supplementary Document). Hence the minimum sample size is 4850 observations in each country, increased to 5390 observations to allow for a non-consent rate of 10% (Table 4).
As the expected prevalence of ACS is less than 0.5%, the resulting very large sample size was not feasible for this study [54,55]. Observations will be collected in three countries: in Tanzania and Nepal, each facility will observe this number of births, and in Bangladesh observations will take place in two facilities (Table 4) [4,5,56]. We anticipate a total of >20 000 observed births, aiming to capture at least 106 observations per intervention per country, except for ACS (Table 4 and Appendix S2, Table S3 in Online Supplementary Document).
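The sample size arithmetic can be sketched in Python with the standard formula for estimating a proportion, n = z^2 p(1 - p)/d^2, followed by scaling up for how rarely the intervention occurs and for non-consent. The 2% prevalence of the rarest observable intervention is an assumption for illustration only (the rates actually applied are in Appendix S2). With that assumption, dividing by (1 - non-consent rate) and rounding up to the nearest 10, the sketch happens to give the 4850 and 5390 figures quoted above.

```python
import math
from fractions import Fraction
from statistics import NormalDist


def cases_needed(p: float = 0.5, precision: float = 0.10, alpha: float = 0.05) -> int:
    """Events needed to estimate a proportion p (eg, sensitivity) to +/- precision."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)


def births_for_cases(cases: int, prevalence: Fraction) -> int:
    """Births to observe so that `cases` eligible events are expected."""
    return math.ceil(cases / prevalence)  # Fraction keeps the division exact


def allow_for_non_consent(births: int, non_consent: Fraction, round_to: int = 10) -> int:
    """Inflate for women who decline, then round up to a convenient multiple."""
    inflated = births / (1 - non_consent)
    return math.ceil(inflated / round_to) * round_to


# ASSUMPTION: 2% prevalence of the rarest observable intervention (illustrative).
n_cases = cases_needed()
n_births = births_for_cases(n_cases, Fraction(2, 100))
n_target = allow_for_non_consent(n_births, Fraction(10, 100))
print(n_cases, n_births, n_target)  # 97 4850 5390
```

Exact fractions are used for the prevalence and non-consent rate so that rounding up is not disturbed by floating-point error.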

Tool development
A formative research phase was undertaken from July to December 2016, including health facility assessments [57], register reviews, data flow assessments, and interviews/focus group discussions (FGDs) with women, caregivers, health workers and senior facility-level staff. The results helped ensure that study sites could meet inclusion criteria and achieve the required sample size, and informed refinement of observer checklists and data collection processes. Maternal recall survey tools were translated into local languages and back-translated.

Data collection software application
A customised tablet-based (Android) software application for data collection and monitoring was developed by the icddr,b team, supported by LSHTM (Figure 5 and Figure 6) [58]. The application has different permissions for each data collector cadre (observation, verification, maternal recall survey and data extraction) and was translated into local languages where relevant. Time-stamped data will be collected using this EN-BIRTH data collection software, stored locally on the tablet, and synchronised regularly to the local central secure database server.

Training of data collectors and supervisors
Data collector cadres include: tracker (responsible for consent, registration, assignment for observation/record verification and subsequent tracking); observer (direct observational data for assigned women and babies); interviewer (maternal recall survey interviews); data verifier/extractor (data from facility registers or case notes); and supervisor (responsible for all data collectors and quality assurance) (Figure 5). Observers with a clinical background (eg, nurses) will be recruited. Data collection staff will receive two weeks of training using classroom-based sessions, group activities and mock data collection within the health facility, as detailed in the Data Collectors Training Handbook [58]. Observer training will include guidance on responding to specific events, including managing maternal distress, and on when to pause data collection and assist in the care of the patient if they perceive that facility staff are responding inappropriately to a life-threatening situation. An individual post-training assessment score of at least 80% is required before data collection can commence.

Observation (Objectives 1, 2, and 3)
Informed written consent will be obtained by the tracker prior to study registration, and basic demographic data collected (Figure 5). Verbal consent will be obtained from health workers. Observers working in the Labour/Delivery ward, Operating theatre and KMC ward/areas will collect direct clinical observation data. These observers will not interact with participating women, their family members or attending health workers during observation (except to respond to a life-threatening event [58]).
Observations on the Labour/Delivery ward will focus on specific aspects of the first, second and third stages of labour, postpartum haemorrhage, immediate newborn care and neonatal resuscitation. Multiple parameters will be recorded to assess content/quality of care, particularly related to intervention timing. KMC observations will focus on the domains of initiation, position, feeding and other treatment administered. Mother and baby outcomes at discharge from hospital will be documented [58]. Additionally, in Nepal, video recording and physiological assessment will be undertaken for neonatal resuscitation. Information regarding these additional processes will be provided separately to women, and informed, signed consent taken [59]. Video cameras and pulse oximeters will be placed on resuscitation tables within the Labour/Delivery ward and Operating theatres, and research staff will be trained in operating and maintaining this equipment. A trained data collector will complete the observation checklist for resuscitation using the recorded video within 24 hours of birth [60,61]. If consent for video use is subsequently withdrawn, these data will be excluded and the video deleted.

Verification using inpatient notes (Objectives 1, 2, and 3)
During the formative phase it was recognised that direct observation was not feasible for two of the selected interventions (neonatal infection management and antenatal corticosteroids). For these interventions, data verifiers will use patient charts/case notes, drug charts, laboratory reports and other relevant routine documentation to verify intervention and quality of care measurements. Supervisors will review/search for any missing or illegible documents before confirming that data are not readable/not recorded [58].

Maternal Recall Survey (Objectives 1, 2, and 3)
Data collectors will interview mothers whose baby's birth or treatment is observed and/or verified, prior to discharge from the postnatal or KMC ward/areas. The software programming of the structured questionnaires will automatically skip certain questions to minimise any risk of further emotional trauma if the mother has experienced a stillbirth or neonatal death [58]. For multiple births, the interview will be completed only for the first-born baby. Consent will be repeated before this interview, in recognition that the mother may have been in labour when she first consented to participate in this research. Consent will also be taken for repeat maternal recall surveys at different intervals after discharge, if follow-up is funded.

Routine register data extraction (Objectives 1, 2 and 3)
Data extractors will use routine labour/delivery registers, KMC registers and neonatal ward registers to extract participant data recorded by facility staff. If data are illegible or cannot be found, supervisors will review/search for these documents before documenting the data as not readable/not recorded [58].

Assess barriers and enablers (Objective 4)
Mixed methods will be used to identify barriers and enablers to routine data recording and use for selected indicators (Table 1). Completeness and quality of existing documentation in routine registers (labour/delivery, KMC and/or neonatal) for the 12 months prior to the study will be evaluated. In Bangladesh and Nepal, 100% of cases in these registers will be extracted. In the Tanzanian facilities, which have a high number of births, a randomly selected 20% sample of labour/delivery cases will be used, with 100% of KMC and neonatal infection cases. Qualitative data collection tools for FGDs, in-depth and key informant interviews will be informed by the MEASURE Evaluation Performance of Routine Information System Management (PRISM) conceptual framework and tools [62], including constructs for Technical, Organizational and Behavioural factors. Data will be collected from study data collectors and facility health workers. Data flow assessments will provide information on the movement of data from registers into DHIS2 and up to national level. Additionally, perceptions regarding which indicators are considered most valuable and most feasible to collect will be explored through interviews with policy makers and technical managers of DHIS2.
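The protocol states only that a random 20% sample of Tanzanian labour/delivery cases will be used. One reproducible way to draw such a sample is a seeded simple random sample without replacement, sketched below in Python; the register IDs, register size and seed are hypothetical, and the study's actual sampling procedure may differ.

```python
import random


def sample_register(entry_ids: list[str], fraction: float, seed: int) -> list[str]:
    """Simple random sample without replacement, returned in register order."""
    k = round(len(entry_ids) * fraction)
    rng = random.Random(seed)  # fixed seed: the draw can be audited and repeated
    chosen = set(rng.sample(entry_ids, k))
    return [entry for entry in entry_ids if entry in chosen]


# Hypothetical 12-month labour/delivery register with 12 000 entries.
labour_register = [f"LD-{i:05d}" for i in range(1, 12_001)]
sampled = sample_register(labour_register, fraction=0.20, seed=2017)
print(len(sampled))  # 2400
```

Recording the seed alongside the extracted data lets a supervisor regenerate exactly the same list of register entries when checking extraction.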

Data quality monitoring
The EN-BIRTH data collection software includes skip rules, consistency checks and pre-defined value ranges for some variables. Progress will be monitored by an online data dashboard providing real-time summary tables per site, including a data capture cascade for selected coverage indicators at each step: registration, consent, observation/verification, maternal recall survey and register data extraction. A traffic light system will indicate overall progress for each indicator using pre-defined thresholds. Bi-weekly all-site calls will allow country teams to review and discuss progress using these data dashboards, and will promote collaborative quality improvement initiatives between countries and sites.
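A data capture cascade with a traffic-light flag might be computed roughly as follows for one indicator at one site. This Python sketch rests on several assumptions: the step-to-step retention measure, the 90% and 75% thresholds and the counts are all hypothetical, since the pre-defined EN-BIRTH thresholds are not given here. Only the step names follow the protocol.

```python
# Steps follow the protocol; counts and thresholds below are hypothetical.
CASCADE_STEPS = ["registered", "consented", "observed_or_verified",
                 "recall_survey", "register_extracted"]

GREEN, AMBER = 0.90, 0.75  # hypothetical thresholds


def traffic_light(proportion: float) -> str:
    """Map a retention proportion to a traffic-light colour."""
    if proportion >= GREEN:
        return "green"
    if proportion >= AMBER:
        return "amber"
    return "red"


def cascade(counts: dict[str, int]) -> list[tuple[str, float, str]]:
    """Proportion retained at each step relative to the previous step."""
    rows = []
    previous = counts[CASCADE_STEPS[0]]
    for step in CASCADE_STEPS[1:]:
        share = counts[step] / previous if previous else 0.0
        rows.append((step, share, traffic_light(share)))
        previous = counts[step]
    return rows


site_counts = {"registered": 400, "consented": 380, "observed_or_verified": 370,
               "recall_survey": 300, "register_extracted": 290}
for step, share, flag in cascade(site_counts):
    print(f"{step:22s} {share:6.1%} {flag}")
```

With these made-up counts, only the maternal recall survey step (300 of 370) falls to amber, which is the kind of bottleneck the dashboard is meant to surface.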
As part of the quality assurance process, for approximately 5% of cases at each site, simultaneous supervisor observation and duplicate data verification and extraction will be conducted using the EN-BIRTH data collection software. The supervisor data will be regarded as the standard and stored in a separate database, and variability between individual data collectors will be estimated by calculating inter-rater reliability using Cohen's kappa (κ) coefficient. Minimum agreement levels of ≥71% for observation and ≥91% for data extraction/case verification will be used [63].
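Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal frequencies: κ = (p_o - p_e)/(1 - p_e). The Python sketch below computes both percent agreement and κ for paired categorical answers. Because the text does not state whether the ≥71% and ≥91% thresholds refer to percent agreement or to κ, both are returned. The example ratings are hypothetical, and this is an illustration rather than the study's analysis code, which is run in Stata.

```python
from collections import Counter


def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Share of paired items on which both raters gave the same answer."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """kappa = (p_o - p_e) / (1 - p_e); p_e is chance agreement from the marginals."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical supervisor (standard) and observer answers to one checklist item.
supervisor = ["yes"] * 6 + ["no"] * 4
observer = ["yes"] * 5 + ["no", "no", "no", "no", "yes"]
print(percent_agreement(supervisor, observer))       # 0.8
print(round(cohens_kappa(supervisor, observer), 3))  # 0.583
```

The example shows why the distinction matters: 80% raw agreement corresponds to a κ of only about 0.58 once chance agreement is removed.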

Data management
EN-BIRTH tablet data will be synchronised and uploaded to an in-country central server, which will be backed up regularly. Raw data will be encrypted, and access restricted to the country data manager, who will anonymise data before data sets are pooled. Server maintenance, data management and cleaning will be coordinated according to agreed protocols, including logical and completeness checks. A unified variable codebook will describe variable names and answer options. Qualitative data will be digitally recorded, transcribed and translated into English. All data will be stored on password-protected computers.

Analysis plan
Analyses will be coordinated using a standard approach, both combining sites and with site-specific and/or country-specific analyses. An overview of research objectives, main research questions and data analysis approaches is summarised in Table 2. Quantitative analyses will be undertaken with Stata 15 (Stata Statistical Software: Release 15).

Objective 1 - Numerator
The "gold standard" used for comparison will be direct observation of selected interventions by a research observer, except for neonatal infection and ACS, for which inpatient note verification will be used. Data extracted from routine facility register records and data collected during the maternal recall survey will be compared separately with this "gold standard" (Figure 4, panel A). The accuracy of each coverage indicator will be assessed by constructing two-by-two tables to calculate the sensitivity and positive predictive value of routine data (Figure 4, panel B). Specificity will be assessed for those indicators with true negatives, and confidence intervals will be computed. The area under the receiver operating characteristic curve (AUC), previously used to validate coverage indicators, will also be calculated for indicators with true negatives [25,46-48,64].
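The two-by-two comparison can be sketched in Python as below, with the reported indicator (register or recall) cross-tabulated against the gold standard (observation or case-note verification). Two choices here are assumptions rather than protocol details: Wilson score intervals for the confidence intervals, and computing AUC as the mean of sensitivity and specificity, which is what the area under the ROC curve reduces to for a single binary classifier. Specificity and AUC are returned as NaN when the gold standard contains no negatives, mirroring the restriction to indicators with true negatives. The counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Reported indicator (rows) against the gold standard (columns)."""
    tp: int  # reported yes, gold standard yes
    fp: int  # reported yes, gold standard no
    fn: int  # reported no, gold standard yes
    tn: int  # reported no, gold standard no


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def validity(t: TwoByTwo) -> dict[str, float]:
    """Sensitivity, specificity, PPV and single-threshold AUC."""
    sens = t.tp / (t.tp + t.fn)
    ppv = t.tp / (t.tp + t.fp)
    gold_negatives = t.tn + t.fp
    spec = t.tn / gold_negatives if gold_negatives else math.nan
    # With one binary classifier the ROC curve has a single operating point,
    # so its area is the mean of sensitivity and specificity (NaN propagates).
    auc = (sens + spec) / 2
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "auc": auc}


table = TwoByTwo(tp=90, fp=5, fn=10, tn=45)  # hypothetical counts
print(validity(table))
print(wilson_ci(table.tp, table.tp + table.fn))  # 95% CI for sensitivity
```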

Objective 2 - Denominators
Various denominator options (Table 1) will be compared using descriptive statistics to assess variation in estimated coverage, and analyses will be undertaken to guide benchmarking. Information on denominators will come from the EN-BIRTH data set, facility total birth data collected from facility reports, and population birth data from estimates based on census or survey and fertility rates, as used in DHIS2. For indicators with a whole-population denominator (ie, uterotonics, breastfeeding) or a clearly measurable "true" denominator regarding clinical need (eg, KMC: birth weight <2000 g), the inflation factor will be used. The inflation factor is the ratio of the estimated routine recording-based prevalence to the true (observed) population-based prevalence. It represents the magnitude of over- or underestimation in the study setting relative to true population-based prevalence.
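Following the definition above, the inflation factor can be sketched in Python as the ratio of two prevalences. The counts are hypothetical; a value above 1 indicates overestimation by routine recording and a value below 1 underestimation.

```python
def prevalence(positives: int, denominator: int) -> float:
    """Proportion of the denominator population with the intervention recorded/observed."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return positives / denominator


def inflation_factor(routine_pos: int, routine_n: int,
                     observed_pos: int, observed_n: int) -> float:
    """Routine recording-based prevalence divided by true (observed) prevalence."""
    true_prev = prevalence(observed_pos, observed_n)
    if true_prev == 0:
        raise ValueError("observed prevalence is zero; inflation factor undefined")
    return prevalence(routine_pos, routine_n) / true_prev


# Hypothetical: 930 of 1000 register entries record a uterotonic; 900 of 1000 observed.
print(round(inflation_factor(930, 1000, 900, 1000), 3))  # 1.033
```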

Objective 3 – Content/quality of care
Multiple recorded parameters will be analysed to assess how well the content and quality of care can be measured, particularly the timing of interventions, in relation to WHO guideline recommendations (Table 3).
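A timing analysis of this kind can be sketched as follows, using uterotonic administration as the example. The 1-minute threshold echoes the uterotonic example given in the Discussion; the record layout (birth time paired with an optional administration time) and the function names are hypothetical and do not describe the EN-BIRTH application's actual data format. Timely coverage is expressed over all observed births, so births with no uterotonic count against it.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional, Tuple

# Hypothetical record: (time of birth, time uterotonic given or None if not given)
Record = Tuple[datetime, Optional[datetime]]


def minutes_after_birth(birth: datetime, event: datetime) -> float:
    return (event - birth).total_seconds() / 60


def timing_summary(records: List[Record], threshold_min: float = 1.0) -> dict:
    """Summarise uterotonic timing from time-stamped observation records."""
    delays = [minutes_after_birth(b, u) for b, u in records if u is not None]
    # Negative delays (time stamp before birth) are not counted as timely.
    timely = sum(1 for d in delays if 0 <= d < threshold_min)
    return {
        "births": len(records),
        "given": len(delays),
        "median_delay_min": median(delays) if delays else None,
        # Timely coverage uses all births as the denominator.
        "pct_timely": 100 * timely / len(records) if records else None,
    }


birth = datetime(2017, 7, 1, 10, 0)
example = [
    (birth, birth + timedelta(seconds=30)),
    (birth, birth + timedelta(minutes=2)),
    (birth, None),                              # not given
    (birth, birth + timedelta(seconds=45)),
]
print(timing_summary(example))
```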

Objective 4 – Barriers and enablers
To assess barriers and enablers to indicator data recording and use, mixed methods will be applied, based on a framework adapted from PRISM [62] and drawing on other tools [65]. Quantitative analysis of routine register data collected before and during the study will address two research questions: (1) Are some indicators recorded more completely than others? (2) Has routine recording changed during the study period? Qualitative data from FGDs, in-depth interviews and key informant interviews will be analysed using QSR International's NVivo 12 qualitative software (NVivo qualitative data analysis Software; QSR International Pty Ltd Version 12.1, 2018). Predetermined codes will be applied by two independent researchers, and the data will be organised into units of information covering broad categories, with relevant emerging themes grouped together.
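The second quantitative question (whether routine recording changed during the study) could be examined by comparing the completeness of a register field before and during data collection; the same before/during comparison is what the Discussion proposes for assessing a possible Hawthorne effect. The protocol does not name a statistical test, so the pooled two-proportion z-test below is an assumed, commonly used choice, and the field name and counts are hypothetical.

```python
from math import erf, sqrt
from typing import Dict, List, Optional, Tuple


def completeness(rows: List[Dict[str, Optional[str]]], field: str) -> Tuple[int, int]:
    """Return (rows with a non-empty value for field, total rows)."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled, len(rows)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # both periods 0% or both 100% complete
        return 0.0, 1.0
    z = (p2 - p1) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF at |z|
    return z, 2 * (1 - normal_cdf)


# Hypothetical register extracts: completeness of one field before vs during.
before = [{"uterotonic": "yes"}] * 60 + [{"uterotonic": ""}] * 40
during = [{"uterotonic": "yes"}] * 80 + [{"uterotonic": None}] * 20
x1, n1 = completeness(before, "uterotonic")
x2, n2 = completeness(during, "uterotonic")
z, p = two_proportion_z_test(x1, n1, x2, n2)
print(f"before {x1}/{n1}, during {x2}/{n2}, z={z:.2f}, p={p:.4f}")
```

Running `completeness` across several fields in the same period would answer the first question (relative completeness between indicators) with no extra machinery.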

DISCUSSION
EN-BIRTH is the first large study to assess the validity of newborn and maternal care indicators in routine data systems, doing so at very large scale (>20 000 observed births) across three countries with a high burden of mortality. Previous maternal and newborn indicator validation studies have focused on testing the validity of women's self-report, the method used in population-based household surveys [25,46-48,64]. Validation studies of facility registers have focussed on outcome measures [29]. The EN-BIRTH study seeks to validate both routine registers and maternal recall at discharge for coverage indicators of high-impact interventions. The novel software developed for this research allows detailed and precise recording of events around the time of birth, particularly the timing of interventions. There are many studies examining quality of care at birth [66,67]; this research does not repeat that work but focuses on the accuracy of routine reporting of care.
This research responds to calls from country and programme leaders for guidance on indicators for maternal and newborn services that can track progress towards national targets and Universal Health Coverage [9,17,68]. The high reporting load in many countries with multiple programmes, donors and indicators may result in the so-called data rich, information poor (DRIP) syndrome [69]. Beyond the reporting burden on the system, individual midwives and doctors are responsible for recording data in multiple registers and patient records, sometimes at the expense of providing respectful, quality care for women and babies. Hence a shorter list of evidence-based indicators is required for national tracking, taking into account validity and utility in low-resource, high-burden settings. The results of this study will inform recommendations on indicators appropriate for uptake within HMIS, and may also identify some that are not appropriate for use at higher levels of the health system (Figure 3). This research will also help to improve the capture and quality of data in HMIS, especially DHIS2.
During the MDG era, population-level surveys were the most common data source in high-burden countries, but studies consistently demonstrate challenges with maternal recall data, especially regarding details of clinical interventions [24,25,46,47,64]. For data that require medical knowledge, and especially for events that women may not have closely witnessed (eg, neonatal resuscitation), we expect poor maternal recall, which may reflect the lack of information given to families experiencing complications. Given continued reliance on household surveys for demographic and health data in many remote or unstable settings, we anticipate that the main value of our maternal recall survey validation findings will be to contribute to understanding which indicators are not suitable for use in household surveys. If a woman does not know about an intervention at discharge from hospital, we expect that her later recall of it will not be useful either.
VIEWPOINTS RESEARCH THEME 5: MEASURING COVERAGE OF ESSENTIAL MATERNAL AND NEWBORN CARE INTERVENTIONS: AN UNFINISHED AGENDA
A strength of this study design is the rigorous, large-scale assessment of the validity of facility routine data by comparison with direct observation, defined here as the "gold standard". Another strength is a specific focus on the denominator challenge. In an era of Universal Health Coverage, with discussions surrounding the scale-up of more complex care for targeted populations, the science of denominator measurement, use of proxies, and selection of benchmarks will be increasingly important. This challenge applies to denominator measurement for maternal and newborn complications (as well as for other large-burden conditions, notably non-communicable diseases). This study, however, is not designed to validate denominators that depend on subjective assessment of clinical need (eg, requiring neonatal resuscitation). Hence, we will only be able to measure true negatives, calculate specificity, and undertake "area under the curve" analysis for interventions with a whole-population or clearly defined denominator [25,46-48,64].
This research also offers a unique opportunity to examine quality of care data from >20 000 births and to assess the extent to which we can accurately capture specific components of care, including the content and timing of selected interventions. Although many specific aspects of care may be measured locally to drive quality improvement at facility level, here we will focus on quality of care indicators that may be useful at district or national levels of the health system. Timing of interventions is a critical marker of quality of care, since delays are a matter of life or death: a woman may die in hours, a baby in minutes. Moreover, the sequence of interventions is complex and may even be concurrent (eg, how often is the correct dose of uterotonic given <1 minute after birth to prevent postpartum haemorrhage; how soon is bag-and-mask ventilation initiated for a baby who is not breathing; how many hours each day is a baby kept in KMC position). The time-stamped design of the EN-BIRTH data collection software will permit analysis of such sequences.
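To illustrate what time-stamped data make possible for the last of these questions, the sketch below totals the hours a baby spends in KMC position on each calendar day from a list of observed (start, end) episodes, splitting any episode that crosses midnight. It assumes non-overlapping episodes recorded as naive local datetimes; the data layout and function name are hypothetical rather than taken from the EN-BIRTH software.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta
from typing import Dict, List, Tuple


def kmc_hours_per_day(episodes: List[Tuple[datetime, datetime]]) -> Dict[date, float]:
    """Sum time in KMC position per calendar day.

    Assumes non-overlapping (start, end) pairs of naive datetimes; an episode
    crossing midnight is split between the two days. Episodes with end <= start
    contribute nothing.
    """
    hours: Dict[date, float] = defaultdict(float)
    for start, end in episodes:
        cursor = start
        while cursor < end:
            next_midnight = datetime.combine(cursor.date() + timedelta(days=1), time.min)
            segment_end = min(end, next_midnight)
            hours[cursor.date()] += (segment_end - cursor).total_seconds() / 3600
            cursor = segment_end
    return dict(hours)


episodes = [
    (datetime(2018, 3, 1, 20, 0), datetime(2018, 3, 2, 2, 0)),   # crosses midnight
    (datetime(2018, 3, 2, 8, 0), datetime(2018, 3, 2, 12, 30)),
]
print(kmc_hours_per_day(episodes))
```

Overlapping episodes would be double-counted, so in practice they would need to be merged first.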
Whilst direct observation is considered the "gold standard", data collectors might miss interventions when several actions at birth happen concurrently, especially in an emergency. We will limit potential recording bias by using observers with health backgrounds who are familiar with the procedures under observation [70-72]. EN-BIRTH data will also be entered directly into the tablet software to allow fast data capture. The study also presents several ethical challenges, including the dilemma of observing a life-threatening situation without an appropriate response from facility staff, and gaining informed consent during labour [58]. The clinically trained observers will have underlying familiarity with hospital environments, experience in upholding study protocols correctly [70] and experience in maintaining participant confidentiality. Training and processes will be put in place to take account of professional and legal duty of care.
The "Hawthorne effect" describes the phenomenon in which a research participant's behaviour is altered as a consequence of being studied or observed, and can be a source of bias in observational research [73]. Within this study, the presence of clinical observers may influence health workers to change their approach to care and to routine register recording. However, there is some evidence to suggest that sustained contact with participants (as in this study) may mitigate altered behaviours in health care settings [74]. To assess this bias, we will analyse changes in register data completeness and quality before and during the study.
Although the EN-BIRTH study is not powered to validate an ACS administration indicator, this indicator will still be included. Current WHO guidelines strongly recommend a single course of ACS for any woman at risk of imminent preterm birth (24-34 weeks of gestation) provided the following criteria are met: 1) gestational age can be accurately assessed; 2) there is no evidence of maternal infection; 3) preterm birth is considered imminent; and 4) adequate childbirth and newborn care services are available [75]. EN-BIRTH study sites were assessed in accordance with these WHO guidelines. The Antenatal Corticosteroid Trial (ACT) evaluated the use of ACS at lower levels of the health system, with half of study births in home settings and care often provided by traditional birth attendants [76]. ACT reported a risk of adverse outcomes, particularly where ACS was administered after 34 weeks, and highlighted important challenges in measuring gestational age and assessing maternal infection. The need that ACT demonstrated for robust data and further evidence in such settings, along with the imperative of ensuring safety and effectiveness, makes measurement of ACS coverage and outcomes essential. Therefore, the EN-BIRTH ACS analysis will focus on assessing relevant documentation to report current ACS administration practice compared with the WHO safety criteria [75].
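Comparing documented ACS practice with the four WHO conditions listed above amounts to a rule check per woman who received ACS. The sketch below encodes those conditions under explicit assumptions: the record fields are hypothetical abstractions from inpatient notes, gestational age is taken in completed weeks with the 24-34 week window treated as inclusive (an assumed reading, adjustable via parameters), and any criterion not documented counts as not shown to be met. The facility-level condition (adequate care available) could equally be filled in from the site assessment rather than the individual record.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AcsRecord:
    """Hypothetical fields abstracted from inpatient notes for one woman given ACS."""
    ga_weeks: Optional[int]                 # completed weeks; None if not documented
    ga_assessed_accurately: Optional[bool]  # True if notes document an accurate assessment
    maternal_infection: Optional[bool]      # None = not documented
    preterm_birth_imminent: Optional[bool]
    adequate_care_available: Optional[bool]


def unmet_criteria(rec: AcsRecord, min_weeks: int = 24, max_weeks: int = 34) -> List[str]:
    """List the WHO conditions that the documentation does not show as met."""
    unmet = []
    if rec.ga_weeks is None or rec.ga_assessed_accurately is not True:
        unmet.append("accurate gestational age not documented")
    elif not (min_weeks <= rec.ga_weeks <= max_weeks):
        unmet.append("gestational age outside eligible window")
    if rec.maternal_infection is not False:
        unmet.append("absence of maternal infection not documented")
    if rec.preterm_birth_imminent is not True:
        unmet.append("imminent preterm birth not documented")
    if rec.adequate_care_available is not True:
        unmet.append("adequate childbirth/newborn care not documented")
    return unmet


eligible = AcsRecord(30, True, False, True, True)
late = AcsRecord(36, True, None, True, True)
print(unmet_criteria(eligible))
print(unmet_criteria(late))
```

Treating undocumented criteria as unmet reflects the documentation-focused aim of this analysis; a clinical audit might instead report them as a separate "unknown" category.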
Given the importance of the neonatal period for the risk and prevention of long-term adverse child development outcomes, we plan a five-year follow-up of children recruited to the EN-BIRTH study who received basic neonatal interventions [77]. The Every Newborn – Simplified Measurement Integrating Longitudinal Neurodevelopment & Growth (EN-SMILING) study aims to detect child development outcomes as early as possible for referral to services, and to improve routine measurement of child development outcomes in programme settings. The EN-BIRTH study is enriched by the active involvement of experts and policymakers from the EN-BIRTH Expert Advisory Group, the Every Newborn implementation community, EPMM, UN agencies including WHO, UNICEF and UNFPA, and many partners and donors. In addition, each of the three countries has a National Advisory Committee that will actively participate in the research process and support uptake of findings. Results will also be published in peer-reviewed journals and disseminated to all relevant audiences. Following EN-BIRTH validity testing, an important next step will be to evaluate the feasibility of a short list of indicators at different levels of the health system.
Most of the 5.5 million deaths around the time of birth [3] still occur in settings with the least data. Household surveys remain a key data source in the poorest countries, and Every Newborn is also involved in a multi-site study, EN-INDEPTH, to assess and improve these data [78]. Data improvement is fundamental for monitoring more rapid progress towards global and national mortality targets, and for achieving Universal Health Coverage for all women and newborns [15]. With ongoing investment in electronic data platforms (including DHIS2) and increasing country demand for evidence-based indicators, we anticipate that these results will advance the availability and use of data to improve coverage, quality and equity, helping to end preventable maternal and newborn deaths as well as stillbirths.
Finally, and most importantly, we thank the women, their families, the health workers and the hundreds of data collectors involved in the EN-BIRTH study.
Ethics and consent to participate: This study was granted ethical approval by institutional review boards in all operating countries, including the London School of Hygiene & Tropical Medicine (Appendix S3 of Online Supplementary Document).
Availability of data and material: All collaborating partners have signed data sharing and transfer agreements.
Funding: The Children's Investment Fund Foundation (CIFF) is the main funder of this research, which is administered via the London School of Hygiene & Tropical Medicine. The Swedish Research Council specifically funded the Nepal site through UNICEF and Lifeline Nepal. The main funding for the Windsor research design workshop was provided by CIFF; in addition, the United States Agency for International Development, Saving Newborn Lives/Save the Children, WHO and the Bill & Melinda Gates Foundation, through the United States Fund for UNICEF, funded many participants' travel and accommodation for attendance.
Authorship contributions: The study was conceptualized by JEL in 2014, and the initial protocol was coordinated by HR with JEL during 2016, with inputs from SC, SM, HB, the EN-BIRTH advisory group (names listed above), and a multi-stakeholder Windsor research design workshop (names listed above). From mid-2016 VSG, GGL, DB and AB, with LTD joining in 2017. GGL and HR led inputs to observation checklists and led development of training materials with country coordinators (Bangladesh: TT and AER; Nepal: NR; Tanzania: NS). DB led development of the maternal recall survey and coordinated the health facility assessments. Each of the three country research teams contributed to all the data collection tools and review processes. The icddr,b team from Bangladesh (notably AER, TT, TH, QSR, SA and SBZ) led the development of the software application, data dashboards and database, with VSG and the LSHTM team. icddr,b (AER) also led the development of the verification form for infection case management and the data variable dictionary. The IHI and MUHAS in Tanzania hosted the study implementation workshop, contributed to data collection tools and training materials, and are leading work on objective 4 (barriers and enablers for data collection and use).
The Nepal team contributed to data collection tools and training materials, and led on the use of video filming. The manuscript was drafted by HR, JEL and LTD, with further review of the analysis sections by VSG and SC and major inputs particularly from GM, AER and TT. All authors reviewed and helped to revise the manuscript.
Competing interests: The authors completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available upon request from the corresponding author), and declare no conflicts of interest.